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METHOD FOR EXTRACTING INFORMATION FRQM_A DATABASE 

Technical field 

The present invention relates to a method for 
extracting information from a database. The database 
5 comprises a number of data tables containing values of a 
number of variables, each data table consisting of at 
least one data record including at least two of said 
values . The information is extracted by evaluation of at 
least one mathematical function, which operates on one or 
10 more selected calculation variables. Further, the extrac- 
ted information is partitioned on one or more selected 
classification variables. 
Background of the invention 



15 from a database that is stored on a secondary memory of a 
computer. More specifically, there is need to summarise a 
large amount of data in the database, and present the 
summarised data to a user in a lucid way. For example, a 
user might be interested in extracting total sales per 

2 0 year and client from a database including transaction 

data for a large company. Thus, the extraction involves 
evaluation of a mathematical function, e.g. a summation 
( "SUM (x*y) " ) / operating on a combination of calculation 
variables (x, y) , e.g. the number of sold items 

25 ("Number") and the price per item ("Price") . The extrac- 
tion also involves partitioning the information according 
to classification variables, e.g. "Year" and "Client". 
Thus, the classification variables define how the result 
of the mathematical operation should be presented. In 

30 this specific case, the extraction of the total sales per 
year by client would involve evaluation of 
"SUM (Number*Price) per Year, Client". 

In one prior-art solution, a computer program is 
designed to process the database and to evaluate all 



It is often desired to extract specific information 



I 
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conceivable mathematical functions operating on all 
conceivable calculation variables partitioned on all 
conceivable classification variables, also called 
dimensions. The result of this operation is a large data 
5 structure commonly known as a multidimensional cube. This 
multidimensional cube is obtained through a very time- 
consuming operation, which typically is performed over- 
night . The cube contains the evaluated results of the 
mathematical functions for every unique combination of 

10 the occurring values of the classification variables. The 
user can then, in a different computer program operating 
on the multidimensional cube, explore the data of the 
database, for example by visualising selected data in 
pivot tables or graphically in 2D and 3D charts. When the 

15 user defines a mathematical function and one or more 
classification variables, all other classification 
variables are eliminated through a summation over the 
results stored in the cube for this mathematical func- 
tion, the summation being made for all other classifi- 

2 0 cation variables. Thus, by adding or removing classifi- 
cation variables, the user can move up or down in the 
dimensions of the cube. 

This approach has some undesired limitations. If the 
multidimensional cube after evaluation contains average 

25 quantities, e.g. the average sales partitioned on a 
number of classification variables, the user cannot 
eliminate one or more of these classification variables 
since a summation over average quantities does not yield 
a correct total average. In this case, the multidimen- 

30 sional cube must contain the average quantity split on 

every conceivable combination of classification variables 
as well, adding an extra complexity to the operation of 
building the multidimensional cube. The same problem 
arises for other quantities, e.g. median values. 

35 Often it is difficult to predict all relevant mathe- 

matical functions, calculation variables and classifica-. 
tion variables before making a first examination of the 
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data in the database. Upon identifying trends and 
patterns, the user might find a need to add a function or 
a variable to reach underlying details in the data. Then, 
the time-consuming procedure of building a new multi- 
5 dimensional cube must be initiated. 
Summary of the invention 

Accordingly, the object of the present invention is 
to mitigate the above-mentioned drawbacks and to provide 
a method for extracting information from a database, 
10 which method allows the user to freely select mathemati- 
cal functions and incorporate calculation variables in 
these functions as well as to freely select classifica- 
tion variables for presentation of the results. 

This object is achieved by a method having the 
15 features recited in independent claim 1. Preferred 
embodiments are recited in the dependent claims. 

According to the present invention there is provided 
a method for generating a final data structure, i.e. a 
multidimensional cube, from data in a database in an 
20 efficient way, with respect to both process time and 

memory requirement. Since the cube can be generated much 
faster than in prior-art solutions, it is possible to 
generate multidimensional cubes ad hoc. The user can 
interactively define and generate a cube without being 
25 limited to a very small number of mathematical functions 
and variables. The mathematical function is normally 
composed of a combination of mathematical expressions. If 
the user needs to modify the mathematical function by 
changing, adding or removing a mathematical expression, a 
3 0 new cube can normally be generated in a time short enough 
not to disturb the user in his work. Similarly, if the 
user desires to add or remove a variable, the cube can be 
rapidly regenerated . 

This is achieved by a clever grouping of all rele- 
3 5 vant data tables into boundary tables and connecting 
tables, respectively, based on the type of variables 
included in each table. By electing one of these tables 
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as a starting point and by building an appropriate 
conversion structure, the final data structure can be 
efficiently generated from the starting table by use of 
the conversion structure . 
5 Preferably, the data records of the database are 

first read into the primary memory of a computer so that 
the data can be processed off-line. This will further 
reduce the time for searching the database and generating 
the final data structure. The database may be stored on a 

10 secondary memory or be a remotely stored database to 

which the computer is connected by a modem. It is to be 
understood that the database thus read into the primary 
memory may be a selected part of a larger database or a 
combination of two or more databases. 

15 In one preferred embodiment, each different value of 

each data variable is assigned a binary code and the data 
records are stored in binary- coded form. On account of 
the binary coding, very rapid searches can be conducted 
in the data tables. Moreover, redundant information can 

20 be removed, resulting in a reduced amount of data. 

In another preferred embodiment, all boundary and 
connecting tables that include calculation variables with 
a need for frequency data, i.e. variables for which the 
number of duplicates of each value is necessary for 

25 correct evaluation of the mathematical function, define a 
subset. By electing the starting table from this siibset 
and by including frequency data in the conversion struc- 
ture, memory-efficient storage of duplicates can be 
achieved when building the final data structure. 

30 In the conversion structure, the frequency data can 

be included by duplication of each value, i.e. the con- 
version structure will contain a link from each value of 
a connecting variable in the starting table to a correct 
number of each value of at least one corresponding 

35 selected variable in a boundary table. Alternatively, a 
counter may be included in the conversion structure for 
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each unique value of each connecting variable in the 
starting table. 

Preferably, the boundary or connecting table having 
the largest number of data records is elected as starting 
5 table. This tends to minimise the amount of frequency 
data that must be incorporated in the conversion struc- 
ture, which therefore can be built more rapidly. 

In a further preferred embodiment, a virtual data 
record is created by reading a data record of the 

10 starting table and by using the conversion structure to 
convert each value of each connecting variable in this 
data record into a value of at least one corresponding 
selected variable. Thereby, the virtual data record will 
contain a current combination of values of the selected 

15 variables. The final data structure can be gradually 
built by sequentially reading data records from the 
starting table, by updating the content of the virtual 
data record based on the content of each such data 
record, and by evaluating the mathematical function based 

2 0 on the content of each such updated virtual data record. 

This procedure minimises the amount of computer memory 
that is needed for extracting the requested information 
from the database. Further, virtual data records contain- 
ing undefined values, so-called NULL values, of any 
25 calculation variable can often be immediately removed, in 
particular when all calculation variables exhibit NULL- 
values, since in many cases such values can not be used 
in the evaluation of the mathematical function. This 
feature will contribute to an optimised performance. 

3 0 In another preferred embodiment, an intermediate 

data structure is built based on the content of the 
virtual data record. Each data record of the intermediate 
data structure contains a field for each selected classi- 
fication variable and an aggregation field for each 
3 5 mathematical expression included in the mathematical 
function. For each updated virtual data record, each 
mathematical expression is evaluated and the result is 
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aggregated in the appropriate aggregation field based on 
the current value of each selected classification 
variable. Such an intermediate data structure allows the 
user to combine mathematical expressions with different 
5 need for frequency data in one mathematical function. By 
building several conversion structures incorporating 
corresponding frequency data, and by evaluating the data 
records of a starting table for each such mathematical 
expression based on a corresponding conversion structure, 
10 it is possible to merge the results of these evaluations 
in one intermediate data structure. Likewise, if the user 
modifies the mathematical function by adding a new mathe- 
matical expression operating on the already selected cal- 
culation variables, it is only necessary to add an 
15 aggregation field to the existing intermediate data 

structure, or to extend an existing aggregation field. 

It should be noted that the virtual data record in 
general is indeed virtual, i.e. it is not physically 
allocated any memory, during the transition from a data 
20 record of the starting table to the final data structure. 
However, such a virtual data record can always, at least 
implicitly, be identified in the procedure of converting 
the content of a data record of the starting table into 
current values of the selected variables. 
Description of preferred embodiments 

Tke present invention will now be described by way 
of examples, reference being made to the tables of 
AppendisA A and to Figs 1-2 of the drawings. Fig. 1 
showing t^he content of a database after identification of 
relevant ^iata tables according to the inventive method, 
and Fig. 2\ showing a sequence of steps of an embodiment 
of the metnpd according to the invention. 

A database, as shown in Fig. 1, comprises a number 
of data tables (Tables 1-5) . Each data table contains 
3 5 data values of a number of data variables. For example, 
in Table 1 each data record contains data values of the 
data variables "Product", "Price" and "Part". If there is 



wo 00/55766 




PCT/SEOO/00482 



7 

no specific value in a field of the data record, this 
field is considered to hold a NULL-value. Similarly, in 
Table 2 each data record contains values of the variables 
"Date", "Client", "^Product" and ^'Number" . Typically, the 
5 data values are stored in the form of ASCII -coded 
strings . 

The method according to the present invention is 
implemented by means of a computer program. In a first 
step (step 101) , the program reads all data records in 

10 the database, for instance using a SELECT statement which 
selects all the tables of the database, i.e. Tables 1-5 
in this case. Typically, the database is read into the 
primary memory of the computer. 

To increase the evaluation speed, it is preferred 

15 that each unique value of each data variable in said 

database is assigned a different binary code and that the 
data records are stored in binary- coded form (step 101) . 
This is typically done when the program first reads the 
data records from the database. For each input table, the 

20 following steps are carried out. First the column names, 
i.e. the variables, of the table are successively read. 
Every time a new data variable appears, a data structure 
is instantiated for it. Then, an internal table structure 
is instantiated to contain all the data records in binary 

2 5 form, whereupon the data records are successively read 

and binary-coded. For each data value, the data structure 
of the corresponding data variable is checked to estab- 
lish if the value has previously been assigned a binary 
code. If so, that binary code is inserted in the proper 

30 place in the above-mentioned table structure. If not, the 
data value is added to the data structure and assigned a 
new binary code, preferably the next one in ascending 
order, before being inserted in the table structure. In 
other words, for each data variable, a unique binary code 

35 is assigned to each unique data value. 



wo 00/55766 



PCT/SEOO/00482 




8 

Tablelte 6-12 of Appendix A show the binary codes 
^ assigned t<A different data values of some data variables 

that are inaluded in the database of Fig. 1. 

After having read all data records in the database, 
5 the program analyses the database to identify all connec- 
tions between the data tables (step 102) . A connection 
between two data tables means that these data tables have 
one variable in common. Different algorithms for perform- 
ing such an analysis are known in the art. After the 
10 analysis all data tables are virtually connected. In Fig. 
1, such virtual connections are illustrated by double- 
ended arrows (a) . The virtually connected data tables 
should form at least one so-called snowflake structure, 
i.e. a branching data structure in which there is one and 
15 only one connecting path between any two data tables in 
the database. Thus, a snowflake structure does not con- 
tain any loops. If loops do occur among the virtually 
connected data tables, e.g. if two tables have more than 
one variable in common, a snowflake structure can in some 

2 0 cases still be formed by means of special algorithms 
known in the art for resolving such loops . 

After this initial analysis, the user can start to 
explore the database. In doing so, the user defines a 
mathematical function, which could be a combination of 
25 mathematical expressions (step 103) . Assume that the user 
wants to extract the total sales per year and client from 
the database in Fig. 1. The user defines a corresponding 
mathematical function "SUM(x*y) " , and selects the calcu- 
lation variables to be included in this function: ''Price" 
30 and ''Number" . The user also selects the classification 
variables: "Client" and "Year". 

The computer program then identifies all relevant 
data tables (step 104), i.e. all data tables containing 
any one of the selected calculation and classification 

3 5 variables, such data tables being denoted boundary 
tables, as well as all intermediate data tables in the 
connecting path(s) between these boundary tables in the 
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snowflake structure, such data tables being denoted 
connecting tables. For the sake of clarity, the group of 
relevant data tables (Tables 1-3) is included in a first 
frame (A) in Fig. 1. Evidently, there are no connecting 
5 tables in this particular case. 

In the present case, all occurrences of every value, 
i.e. frequency data, of the selected calculation 
variables must be included for evaluation of the mathe- 
matical function. In Fig. 1, the selected variables 

10 (''Price", "Number") requiring such frequency data are 

indicated by bold arrows (b) , whereas remaining selected 
variables are indicated by dotted lines (b')- Now, a 
subset (B) can be defined that includes all boundary 
tables (Tables 1-2) containing such calculation variables 

15 and any connecting tables between such boundary tables in 
the snowflake structure. It should be noted that the 
frequency requirement of a particular variable is deter- 
mined by the mathematical expression in which it is 
included. Determination of an average or a median calls 

2 0 for frequency information. In general, the same is true 

for determination of a sum, whereas determination of a 
maximum or a minimum does not require frequency data of 
the calculation variables. It can also be noted that 
classification variables in general do not require 
25 frequency data. 

Then, a starting table is elected, preferably among 
the data tables within subset (B) , most preferably the 
data table with the largest number of data records in 
this subset (step 105) . In Fig. 1, Table 2 is elected as 

3 0 the starting table. Thus, the starting table contains 

selected variables ("Client", "Number"), and connecting 
variables ("Date", "Product"). These connecting variables 
link the starting table (Table 2) to the boundary tables 
(Tables 1 and 3) . 
3 5 There^after, a conversion structure is built (step 

106), as sliown in Tables 13 and 14. This conversion 
structure is used for translating each value of each 
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connecting variable ("Date", "Product") in the starting 
table Viable 2) into a value of a corresponding selected 
variable. ("Year", "Price") in the boundary tables (Table 
3 and 1 , \respectively) . Table 13 is built by successively 
5 reading dkta records of Table 3 and creating a link 
between each unique value of the connecting variable 
("Date") and a corresponding value of the selected var- 
iable ("YeaJT"). It can be noted that there is no link 
from value A ("Date: 1999-01-12"), since this value is 
10 not included Vn the boundary table. Similarly, Table 14 
is built by siJccessively reading data records of Table 1 
and creating a\link between each unique value of the 
connecting variable ("Product") and a corresponding value 
of the selected Wriable ("Price"). In this case, value 2 
15 ("Product: Toothpaste") is linked to two values of the 
selected variable \ "Price : 6.5"), since this connection 
occurs twice in the\ boundary table. Thus, frequency data 
is included in the cbnversion structure. Also note that 
there is no link frot\ value 3 {"Product: Shampoo") . 
2 0 When the conversion structure has been built, a 

virtual data record is created. Such a virtual data 
record, as shown in Table 15, accommodates all selected 
variables ("Client", "Year", "Price", "Number") in the 
database. In building the virtual data record (step 107- 
2 5 108) , a data record is first read from the starting table 
(Table 2) . Then, the value of each selected variable 
("Client", "Number") in the current data record of the 
starting table is incorporated in the virtual data 
record. Also, by using the conversion structure (Tables 
30 13-14) each value of each connecting variable ("Date", 
"Product") in the current data record of the starting 
table is converted into a value of a corresponding 
selected variable ("Year", "Price"), this value also 
being incorporated in the virtual data record . 
35 At this stage (step 109) , the virtual data record is 

used to build an intermediate data structure (Table 16) . 
Each data record of the intermediate data structure 
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accommodates each selected classification variable 
(dimension) and an aggregation field for each mathemati- 
cal expression implied by the mathematical function. The 
intermediate data structure (Table 16) is built based on 
5 the values of the selected variables in the virtual data 
record. Thus, each mathematical expression is evaluated 
based on one or more values of one or more relevant 
calculation variables in the virtual data record, and the 
result is aggregated in the appropriate aggregation field 

10 based on the combination of current values of the classi- 
fication variables (^^Client", "Year"). 

The above procedure is repeated for all data records 
of the starting table (step 110) . Thus, an intermediate 
data structure is built by successively reading data 

15 records of the starting table, by incorporating the 

current values of the selected variables in a virtual 
data record, and by evaluating each mathematical 
expression based on the content of the virtual data 
record. If the current combination of values of classifi- 

2 0 cation variables in the virtual data record is new, a new 
data record is created in the intermediate data structure 
to hold the result of the evaluation. Otherwise, the 
appropriate data record is rapidly found, and the result 
of the evaluation is aggregated in the aggregation field. 

2 5 Thus, data records are added to the intermediate data 

structure as the starting table is traversed. Preferably, 
the intermediate data structure is a data table associa- 
ted with an efficient index system, such as an AVL or a 
hash structure. In most cases, the aggregation field is 

30 implemented as a summation register, in which the result 
of the evaluated mathematical expression is accumulated. 
In some cases, e.g. when evaluating a median, the 
aggregation field is instead implemented to hold all 
individual results for a unique combination of values of 

35 the specified classification variables. It should be 

noted that only one virtual data record is needed in the 
procedure of building the intermediate data structure 
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from the starting table. Thus, the content of the virtual 
data record is updated for each data record of the 
starting table. This will minimise the memory requirement 
in executing the computer program. 
5 The procedure of building the intermediate data 

structure will be further described with reference to 
Tables 15-16. In creating the first virtual data record 
Rl, as shown in Table 15, the values of the selected 
variables "Client" and "Number" are directly taken from 

10 the first data record of the starting table (Table 2). 

Then, the value "1999-01-02" of the connecting variable 
"Date" is transferred into the value "1999" of the 
selected variable "Year" , by means of the conversion 
structure (Table 13) . Similarly, the value "Toothpaste" 

15 of the connecting variable "Product" is transferred into 
the value "6.5" of the selected variable "Price" by means 
of the conversion structure (Table 14) , thereby forming 
the virtual data record Rl . Then, a data record is 
created in the intermediate data structure, as shown in 

20 Table 16. In this case, the intermediate data structure 

has tree columns, two of which holds selected classifica- 
tion variables ("Client", "Year"). The third column holds 
an aggregation field, in which the evaluated result of 
the mathematical expression ( "x*y" ) operating on the 

25 selected calculation variables ("Number", "Price") is 
aggregated. In evaluating virtual data record Rl , the 
current values (binary codes: 0,0) of the classification 
variables are first read and incorporated in this data 
record of the intermediate data structure. Then, the 

30 current values (binary codes: 2,0) of the calculation 
variables are read. The mathematical expression is 
evaluated for these values and added to the associated 
aggregation field . 

Next, the virtual data record is updated based on 

35 the starting table. Since the conversion structure (Table 
14) indicates a duplicate of the value "6.5" of the 
selected variable "Price" for the value "Toothpaste" of 
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the connecting variable '^Product" , the updated virtual 
data record R2 is unchanged and identical to Rl . Then, 
the virtual data record R2 is evaluated as described 
above. In this case, the intermediate data structure con- 
5 tains a data record corresponding to the current values 
(binary codes: 0,0) of the classification variables. 
Thus, the evaluated result of the mathematical expression 
is accumulated in the associated aggregation field. 

Next, the virtual data record is updated based on 

10 the second data record of starting table. In evaluating 

this updated virtual data record R3 , a new data record is 
created in the intermediate data structure, and so on. 

It should be noted that NULL values are represented 
by a binary code of -2 in this example. In the illustra- 

15 ted example, it should also be noted that any virtual 

data records holding a NULL value (-2) of any one of the 
calculation variables can be directly eliminated, since 
NULL values can not be evaluated in the mathematical 
expression ("x*y") . It should also be noted that all NULL 

20 values (-2) of the classification variables are treated 
as any other valid value and are placed in the inter- 
mediate data structure. 

After traversing the starting table, the inter- 
mediate data structure contains four data records, each 

2 5 including a unique combination of values (0,0; 1,0; 2,0; 

3,-2) of the classification variables, and the correspon- 
ding accumulated result (41; 37.5; 60; 75) of the evalua- 
ted mathematical expression. 

Preferably, the intermediate data structure is also 
30 processed to eliminate one or more classification 

variables (dimensions) . Preferably, this is done during 
the process of building the intermediate data structure, 
as described above. Every time a virtual data record is 
evaluated, additional data records are created, or found 

3 5 if they already exist, in the intermediate data struc- 

ture. Each of these additional data records is destined 
to hold an aggregation of the evaluated result of the 
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mathematical expression for all values of one or more 
classification variables. Thus, when the starting table 
has been traversed, the intermediate data structure will 
contain both the aggregated results for all unique 
5 combinations of values of the classification variables, 
and the aggregated results after elimination of each 
relevant classification variable. 

This procedure of eliminating dimensions in the 
intermediate data structure will be further described 

10 with reference to Tables 15 and 16. When virtual data 
record Rl is evaluated (Table 15) and the first data 
record (0,0) is created in the intermediate data struc- 
ture, additional data records are created in this struc- 
ture. Such additional data records are destined to hold 

15 the corresponding results when one or more dimensions are 
eliminated. In Table 16, a classification variable is 
assigned a binary code of -1 in the intermediate data 
structure to denote that all values of this variable are 
evaluated. In this case, three additional data records 

2 0 are created, each holding a new combination of values 

(-1,0; 0,-1; -1,-1) of the classification variables. The 
evaluated result is aggregated in the associated aggrega- 
tion field of these additional data records. The first 
(-1,0) of these additional data records is destined to 

25 hold the aggregated result for all values of the classi- 
fication variable '^Client" when the classification 
variable "Year" has the value "1999". The second (0,-1) 
additional data record is destined to hold the aggregated 
result for all values of the classification variable 

30 "Year" when the classification variable '"Client" is 
''Nisse" . The third (-1,-1) additional data record is 
destined to hold the aggregated result for all values of 
both classification variables '"Client" and "Year" . 

When virtual data record R2 is evaluated, the result 

3 5 is aggregated in the aggregation field associated with 

the current combination of values (binary codes: 0,0) of 
the classification variables, as well as in the aggre- 
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gation fields associated with relevant additional data 
records (binary codes: -1,0; 0,-1; -1,-1). When virtual 
data record R3 is evaluated, the result is aggregated in 
the aggregation field associated with the current combi- 
5 nation of values (binary codes: 1,0) of the classifica- 
tion variables. The result is also aggregated in the 
aggregation field of a newly created additional data 
record (binary codes : 1,-1) and in the aggregation fields 
associated with relevant existing data records (binary 

10 codes: -1,0; -1,-1) in the intermediate data structure. 

After traversing the starting table, the inter- 
mediate data structure contains eleven data records, as 
shown in Table 16. 

Preferably, if the intermediate data structure 

15 accommodates more than two classification variables, the 
intermediate data structure will, for each eliminated 
classification variable, contain the evaluated results 
aggregated over all values of this classification 
variable for each unique combination of values of remain- 

20 ing classification variables, 

X^en the intermediate data structure has been built, 
a f ina^ data structure, i.e. a multidimensional cube, as 
shown iri non-binairy notation in Table 17, is created by 
evaluating the mathematical function ("SUM(x*y)") based 

25 on the resiilts of the mathematical expression ( ^^x*y" ) 

contained rn the intermediate data structure (step 111) . 
In doing so,\ the results in the aggregation fields for 
each unique cbmbination of values of the classification 
variables are Combined. In the illustrated case, the 

30 creation of thA final data structure is straightforward, 
due to the triviVl nature of the present mathematical 
function. The content of the final data structure might 
then (step 112) bA presented to the user in a two- 
dimensional table, ^s shown in Table 18. Alternatively, 

35 if the final data structure contains many dimensions, the 
data can be presented in a pivot table, in which the user 
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interactively can move up and down in dimensions/ as is 
well known imthe art. 

B^ow, a second example of the inventive method is 
described with reference to Tables 20-29. The description 
will only elaborate on certain aspects of this example, 
namely biMlding a conversion structure including data 
from connecting tables, and building an intermediate data 
structure \for a more complicated mathematical function. 
In this example, the user wants to extract sales data per 
10 client from\a database, which contains the data tables 
shown in Tables 20-23. For ease of interpretation, the 
binary coding\ is omitted in this example. 

The user has specified the following mathematical 
functions, for which the result should be partitioned per 
15 Client: 

a) ^^IF (Only (Environment index) =='!') THEN 

Sum (Number* Price ) *2 , ELSE Sum (Number* Price) )" , and 

b) "Avg (Number* Price) " 

The mathematical function (a) specifies that the 
2 0 sales figures should be doubled for products that belong 
to a product group having an environment index of ^ I ' , 
while the actual sales figures should be used for other 
products. The mathematical function (b) has been included 
for reference - 

25 In this case, the selected classification variables 

are "Environment index" and ''Client", and the selected 
calculation variables are ''Number" and "Price" . Tables 
20, 22 and 23 are identified as boundary tables, whereas 
Table 21 is identified as a connecting table. Table 20 is 

30 elected as starting table. Thus, the starting table 

contains selected variables ("Number", "Client"), and a 
connecting variable ("Product"). The connecting variable 
links the starting table (Table 20) to the boundary 
tables (Tables 22-23), via the connecting table (Table 

35 21) . 

Neyt, the formation of the conversion structure will 
be described with reference to Tables 24-26. A first part 
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(Table 24) of the conversion structure is built by 
successively reading data records of a first boundary 
table VTable 23) and creating a link between each unique 
value oV the connecting variable (''Product group"") and a 
5 corresponding value of the selected variable ("Environ- 
ment index") . Similarly, a second part (Table 25) of the 
conversiom structure is built by successively reading 
data recorqs of a second boundary table (Table 22) and 
creating a Aink between each unique value of the connec- 

10 ting variable (''Price group") and a corresponding value 

of the selected variable ("Price") . Then, data records of 
the connect ing\ table (Table 21) is read successively. 
Each value of mie connecting variables ("Product group" 
and "Price grouV' , respectively) in Tables 24 and 25 is 

15 substituted for a corresponding value of a connecting 

variable ("Product") in Table 21. The result is merged in 
one final conversi)pn structure, as shown in Table 26. 

Then, an intermediate data structure is built by 
successively reading data records of the starting table 

20 (Table 20) , by using the conversion structure (Table 26) 
to incorporate the current values of the selected vari- 
ables ("Environment index", "Client", "Number", "Price") 
in the virtual data record, and by evaluating each mathe- 
matical expression based on the current content of the 

2 5 virtual data record. 

For reasons of clarity. Table 2 7 displays the 
corresponding content of the virtual data record for each 
data record of the starting table. As noted in connection 
with the first example, only one virtual data record is 

3 0 needed. The content of this virtual data record is 

updated, i.e. replaced, for each data record of the 
starting table. 

Each data record of the intermediate data structure, 
as shown in Table 28, accommodates a value of each 
35 selected classification variable ("Client", "Environment 
index") and an aggregation field for each mathematical 
expression implied by the mathematical functions. In this 
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case, the intermediate data structure contains two 
aggregation fields. One aggregation field contains the 
aggregated result of the mathematical expression ("'x*y") 
operating on the selected calculation variables 
5 ('^Number", "Price") / as well as a counter of the number 
of such operations. The layout of this aggregation field 
is given by the fact that an average quantity should be 
calculated ( ^'Avg (x*y) " ) . The other aggregation field is 
designed to hold the lowest and highest values of the 

10 classification variable "Environment index" for each 

combination of values of the classification variables. 

As in the first example, the intermediate data 
structure (Table 28) is built by evaluating the 
mathematical expression for the current content of the 

15 virtual data record (each row in Table 27) , and by 

aggregating the result in the appropriate aggregation 
field based on the combination of current values of the 
classification variables ("Client", "Environment index"). 
The intermediate data structure also includes data 

2 0 records in which the value "<ALL>" has been assigned to 
one or both of the classification variables. The corres- 
ponding aggregation fields contain the aggregated result 
when the one or more classification variables (dimen- 
sions) are eliminated, 

25 When the intermediate data structure has been built, 

a final data structure, i.e. a multidimensional cube, is 
created by evaluating the mathematical functions based on 
the evaluated results of the mathematical expressions 
contained in the intermediate data structure. Each data 

30 record of the final data structure, as shown in Table 29, 
accommodates a value of each selected classification 
variable ("Client", "Environment index") and an aggre- 
gation field for each mathematical function selected by 
the user. 

35 The final data structure is built based on the 

results in the aggregation fields of the intermediate 
data structure for each unique combination of values of 
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the classification variables. When function (a) is 
evaluated, by sequentially reading data records of Table 
28, the program first checks if both values in the last 
column of Table 28 is equal to ^I'. If so, the relevant 
5 result contained in the first aggregation field of Table 
28 is multiplied by two and stored in Table 29. If not, 
the relevant result contained in the first aggregation 
field of Table 28 is directly stored in Table 29. When 
function (b) is evaluated, the aggregated result of the 

10 mathematical expression ( "x*y" ) operating on the selected 
calculation variables ("Number", "Price") is divided by 
the number of such operations, both of which are stored 
in the first aggregation field of Table 28. The result is 
stored in the second aggregation field of Table 29. 

15 Evidently, the present invention allows the user to 

freely select mathematical functions and incorporate 
calculation variables in these functions as well as to 
freely select classification variables for presentation 
of the results. 

20 As an alternative, albeit less memory-efficient, to 

the illustrated procedure of building an intermediate 
data structure based on sequential data records from the 
starting table, it is conceivable to first build a so- 
called join table. This join table is built by traversing 

25 all data records of the starting table and, by use of the 
conversion structure, converting each value of each 
connecting variable in the starting table into a value of 
at least one corresponding selected variable in a 
boundary table. Thus, the data records of the join table 

3 0 will contain all occurring combinations of values of the 
selected variables. Then, the intermediate data structure 
is built based on the content of the join table. For each 
record of the join table, each mathematical expression is 
evaluated and the result is aggregated in the appropriate 

35 aggregation field based on the current value of each 

selected classification variable. However, this alter- 
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native procedure requires more computer memory to extract 
the requested information. 

It should be realised that the mathematical function 
could contain mathematical expressions having different, 
5 and conflicting, needs for frequency data. In this case, 
steps 104-110 (Fig. 2) are repeated for each such 
mathematical expression, and the results are stored in 
one common intermediate data structure. Alternatively, 
one final data structure, i.e. multidimensional cube, 
10 could be built for each mathematical expression, the 

contents of these cubes being fused during presentation 
to the user. 
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A ppendix A 



Table 7 



Table 6 



Product 




Soap 


0 


Soft soap 


1 


Toothpaste 


2 


Shampoo 


3 



Price 




7.5 


0 


9.35 


1 


6.5 


2 


Table 9 



Table 8 



Date 




1999-01-02 


0 


1999-01-07 


1 


1999-01-08 


2 


1999-01-11 


3 


1999-01-12 


4 


1999-01-15 


5 



Client 




Nisse 


0 


Gullan 


1 


Kalle 


2 


Pekka 


3 


Jens 


4 



Table 11 



Year 



1999 



Table 10 



Number 




3 


0 


5 


1 


8 


2 


2 


3 


10 


4 



Table 12 



Country 




Sweden 


0 


Denmark 


1 


Finland 


2 
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Table 13 

0\ 



iy 



21 — 

31 — H51 
51 — Kol 



Table 14 
Product — ► Price 

01 — 
1\ — KH 



2h 



rab/e 15 Table 16 




dierrt Year Piice tbmixr aient Year E mbei*Price 



0 






0 










0 


0 




0 










RiO 


0 


2 


0 




0 


0 


Si//n=Su/n(0j+3x6.5->20.5 


R20 


0 


2 


0 




0 


0 


Sum=Sumf20.^+3x6.5->4f 


P3i 


0 


0 


1 




1 


0 


Siim=Siim(0j+5jf 7.5 •> 37.5 


Ri2 


0 


0 


2 




2 


0 


Si//n=Sum(0j+8x7.5->60 


P52 


0 


■2 


3 




2 


0 


Sm=Summ^2x(Nm'>^ 


R63 


■2 


0 


4 




3 


•2 


Sm=Sm((^+10x7.5->75 


R70 


0 


0 


3 




0 


0 


Sl/m=Su/nf4fj+2x(A/ULg->4f 





0 


Suin=Sm(OI +20.5 +20,5 +37.5+ 60 +0+M38.5 


.f 


•2 


Sum=Sw/n(0)+75->75 


0 


■1 


Sum=Su/nfOj +20.5 + 20.5 + 0->4t 


1 


■f 




2 


-t 




3 


■f 


Su/n=Su/nfOj+ 75 -> 75 



•1 



■1 Su/n=Su/nfOj+20.5 + 20.5 + 37.5 + 60 + 0 + 75 + 0-> 2f3.5 
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Table 17 



CImnt 


Year Sum (Number 


Nisse 


1999 


41 


Gullan 


1999 


37.5 


Kalle 


1999 


60 


Pekka 


<NULL> 


75 


<ALL> 


1999 


138.5 


<ALL> 


<NULL> 


75 


Nisse 


<ALL> 


41 


Gullan 


<ALL> 


37.5 


Kalle 


<ALL> 


60 


Pekka 


<ALL> 


75 


<ALLA 


<ALL> 


213.5 



Table 20 



Table 18 

Sum (Price*Number) Per Client,Year 





1999 


<NULL> 


<ALL> 


Nisse 


41 




41 


Gullan 


37.5 




37.5 


Kalle 


60 




60 


Pekka 




75 


75 


<ALL> 


138.5 


75 


75 



Date 


Product 


Number 


Client 


1998-12 -20 


B 


5 


Nisse 


1999-02-05 


A 


7 


Kalle 


1999-02-06 


B 


9 


Kalle 



Table 21 



Product 


Price group 


Product group 


A 


4 


Z 


B 


3 


X 



Table 22 



Table 23 



Price group 


Price 


3 


5.5 


4 


3.5 



Product group 


Environment index 


Legal status 


X 


1 


YES 


Y 


IX 


NO 


Z 


II 


YES 
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Table 24 Table 25 

Product group->Environment index Price group ->Price 

X ► / 3 ► 5.5 

Y ► IX 4 ► 3.5 

^ ^ " Table 26 

Product->Price, Environment index 

A ► 3.5,11 

B ► 5.5,1 



Table 27 



Client 


Environment index 


Number 


Price 


Nisse 


1 


5 


5.5 


Kalle 


II 


7 


3.5 


Kalle 


1 


9 


5.5 



Table 28 



aient 


Envirxmm&ii index 


E-Nuniierx Price 


Z-Emnronm&rt index 


Nisse 


1 


Ix: 27.5. N: 1 


First: 1, Last: 1 


Kalle 


II 


Ix: 24.5, N: 1 


Hrst: II, Last: II 


Kalle 


1 


Ix: 49.5, N: 1 


Hrst: 1, Last: 1 


<ALL> 


1 


Ix:77, N:2 


First: 1. Last: 1 


<ALL> 


II 


Zx: 24.5, N: 1 


First: II, Last: II 


<ALL> 


<AL1> 


Zx: 101.5. N: 3 


Hrst: 1, Last: II 



Table 29 


IF(Oi^(En\/lmnmsntinde)^-l', 
SuniNunixi*PricenSuniNunTbei*Pri(x^^ 




aient 


Environment index 


AvgiNunt&fPhce) 


Nisse 


1 


55.0 


27.5 


Kalle 


II 


24.5 


24.5 


Kalle 


1 


99.0 


49.5 


<ALLA> 


1 


154.0 


38.5 


<ALLA> 


II 


24.5 


24.5 


<ALLA> 


<ALL> 


<NULL> 


33.8 



